6  Further Topics

6.1 Numerical Solutions to Partial Differential Equations

6.2 Random Number Generation

How do we generate random numbers on a computer?

The ability to simulate random numbers on a computer is essential for modeling and understanding systems where chance and variability play a central role. Robust and repeatable random number generation is thus central to statistics, finance, queuing systems, cryptography, and a whole host of algorithms and applications.

But what exactly do we mean by randomness or random numbers? For our purposes, we will consider “random numbers” to mean sequences of numbers that mimic the statistical properties of realisations of random variables drawn with a given distribution.

6.3 Uniform Random Numbers

If we know how to create a sample from a uniform distribution, then we can obtain a sample with a given distribution \(F\) using the following result:

Theorem 6.1: The Inverse Transform Method
Let \(U\) be a uniformly distributed random variable, and let \(F(x)\) be a c.d.f. If \(F\) is invertible, then \(X=F^{-1}(U)\) is distributed according to \(F\).

Therefore it is often enough to have samples from the uniform distribution and hence this distribution plays a central role in the theory of random number generation.

6.3.1 On creating uniform samples

In some sense, there are no truly random number generators. Computers can only execute algorithms, which are deterministic instructions, and thus they can only yield samples which appear random. We call these numbers pseudorandom numbers, and the algorithms which produce these numbers are called pseudorandom number generators.

The theoretical wish

A generator of genuine random numbers is an algorithm that produces a sequence of random variables \(U_1\), \(U_2\), \(\ldots\) which satisfies:

  1. Each \(U_i\) is uniformly distributed between \(0\) and \(1\).
  2. The \(U_i\) are mutually independent.

Property (ii) is the more important one since the normalisation in (i) is convenient but not crucial. Property (ii) implies that all pairs of values are uncorrelated and, more generally, that the value of \(U_i\) should not be predictable from \(U_1,\ldots, U_{i-1}\). Of course, the properties listed above are those of authentically random numbers; the goal is to come as close as possible to these properties with our artificially generated pseudorandom numbers.

6.3.2 Linear Congruential Generators

An important and simple class of generators are the linear congruential generators, abbreviated as LCGs.
We need the modulo operation in order to define this class of generators.

Definition 6.1
For nonnegative integers \(x\) and \(m\), we call \(y=x \;\text{mod}\;m\) the integer remainder from the division \(x/m\); we will write this as \[ y = x \mod m. \]

Definition 6.2
A linear congruential generator (LCG) is an iteration of the form \[ \begin{aligned} x_{i+1} &= (a x_i + c) \mod m \\ u_{i+1} &= \frac{x_{i+1}}{m} \in (0,1), \end{aligned} \] where \(a\), \(c\), and \(m\) are integers.

Example 6.1
Choose \(a=6\), \(c=0\), and \(m=11\). Starting from \(x_0=1\), which is called the seed, gives \[ 1, 6, 3, 7, 9, 10, 5, 8, 4, 2, 1, 6, \ldots \] Choosing \(a=3\) yields the sequence \[ 1, 3, 9, 5, 4, 1, \ldots \] whereas the seed \(x_0 = 2\) results in \[ 2, 6, 7, 10, 8, 2, \ldots \]

Conditions for a Full Period I

Theorem 6.2
Suppose \(c\neq 0\). The generator has full period (that is, the number of distinct values generated from any seed \(x_0\) is \(m-1\)) if and only if the following conditions hold:

  1. \(c\) and \(m\) are relatively prime (their only common divisor is \(1\)).

  2. Every prime number that divides \(m\) divides \(a-1\) as well.

  3. \(a-1\) is divisible by \(4\) if \(m\) is.

If \(m\) is a power of \(2\), the generator has full period if \(c\) is odd and \(a=4n+1\) for some integer \(n\).

Example 6.2
The Borland C++ LCG has parameters \[ m=2^{32},\quad a=22695477=1+4\times 5673869,\quad c=1. \] Hence, by the corollary above, the LCG has full period.

Conditions for a Full Period II

If \(c=0\) and \(m\) is a prime, then full period is achieved from any \(x_0 \not = 0\) when

  1. \(a^{m-1} - 1\) is a multiple of \(m\),
  2. \(a^j - 1\) is not a multiple of \(m\) for \(j=1, \ldots , m-2\).

If \(a\) satisfies these two properties it is called a primitive root of \(m\). In this situation, the sequence \(\{ x_i \}_{i \geq 1}\) is of the form \[ x_0 , a x_0 , a^2 x_0 , a^3 x_0 , \ldots \quad \mod m, \] given that \(c=0\). The sequence returns to \(x_0\) for the first time for the smallest \(k\) which satisfies \(a^k x_0 \mod m = x_0\). This is the smallest \(k\) for which \(a^k \mod m = 1\), that is \(a^k - 1\) is a multiple of \(m\).

Hence, the definition of a prime root coincides with the requirement that the series does not return to \(x_0\) until \(a^{m-1} x_0\).

Example 6.3: Examples of LCG Parameters
Modulus \(m\) Multiplier \(a\) Reference
\(2^{31} - 1\) \(16807\) Lewis, Goodman, and Miller, Park and Miller
\((=2147483647)\) \(39373\) L’Ecuyer
\(742938285\) Fishman and Moore
\(950706376\) Fishman and Moore
\(1226874159\) Fishman and Moore
\(2147483399\) \(40692\) L’Ecuyer
\(2147483563\) \(40014\) L’Ecuyer

Exercise 6.1
Define an LCG with parameters \[ c=0,\quad m=2^3-1=7,\quad a=3. \] Does this LCG have full period?

Earlier versions of Microsoft Excel (prior to 2003) relied on a linear congruential generator (LCG) for its RAND() function, which produced predictable and statistically non-random outputs that undermined the reliability of numerical simulations, Monte Carlo methods, and statistical sampling. This flaw was widely recognized in the simulation community during the late 1990s and early 2000s, leading Microsoft to update Excel’s random number algorithm in the 2003 release.

In applications, the following considerations are typically the most important when considering the appropriateness of a random number generating scheme:

  • Reproducibility
  • Speed
  • Portability
  • Period Length (if any)
  • Randomness (i.e. robust statistical properties)

Linear congruential generators are no longer (and have not been for some time) used practically, although they are a useful illustration of pseudorandom number generation. The Mersenne Twister family of algorithms is among the most popular in practice, and modern implementations are appropriate for most applications. It is notable for its extremely long period (\(2^{19937} - 1\)), strong statistical properties, and fast performance (see Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator by Matsumoto and Nishimura, 1998).

6.4 Testing uniformity

Previously, we have only checked for a full period, to see whether a pseudorandom number generator is reasonable or not. In this section, we demonstrate more elaborate means to test the quality of a pseudorandom number generator.

Given a sample from a supposedly uniform distribution, one can use statistical tests to reject the hypothesis of uniformity. The samples provided by a computer are fake since they are totally deterministic, and therefore not random (as such they cannot be uniformly distributed). However, they are so “well chosen”, that they might appear random. Hence we require statistical tests of randomness in order to judge the quality of a candidate PRNG.

6.4.1 Chi-Squared Test

Definition 6.3
Let \(k\geq 1\), and let \(X_1,\dots,X_k\) be a sequence of i.i.d. standard normally distributed random variables. The distribution of the sum of squares \[ S=X_1^2+\dots+X_k^2 \] is called chi-square with \(k\) degrees of freedom.

The probability density function of \(\chi(k)\) is given by \[ f(x,k)=\frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}, \] where \(\Gamma\) denotes the gamma function, which is defined by \[ \Gamma(\xi)=\int_0^\infty x^{\xi-1}e^{-x}\,\mathrm{d}x. \] For integers \(n \in \mathbb{N}\), we can write the Gamma function in terms of the factorial function as follows: \[ \Gamma(n)=(n-1)!=(n-1)(n-2)\dots 2\cdot 1. \]

Let \[ X_1,X_2,\dots, X_n \] be a sample.

The chi-squared test for uniformity is constituted by the following: - The null hypothesis \(H_0\): The sample is from a uniform distribution against the alternative hypothesis \(H_a\) that it is not. - The test statistic \[ T=\frac{k}{n}\sum_{j=1}^k \left ( n_j-\frac{n}{k}\right)^2, \] where \(k\) is the number of equidistant partitions (“bins”, to be chosen) of the unit interval, given by \[ [0,1/k),\quad [1/k, 2/k),\dots, [(k-1)/k,1]. \] and \(n_j\) is the number of observations in the \(j\)th bin. - The confidence level \(\alpha\) (to be chosen).

The following is given without proof.

Theorem 6.3
As \(n\rightarrow\infty\), \(T\) converges, in distribution, to the chi-square distribution \(\chi_{k-1}\) with \(k-1\) degrees of freedom.

6.4.2 The Kolmogorov–Smirnov Test

Another simple test uses the empirical distribution function of the sample. The Kolmogorov–Smirnov test is based on the following intuition: If the sample is uniformly distributed, the deviation of the empirical distribution function from the theoretical distribution function, as given in Lemma 4.2, should be small.

Definition 6.4
If \(x=(x_1,\dots,x_n)\) is a sample, then the empirical distribution function of \(x\) is given by \[ F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{(-\infty,x)}(x_i), \quad x \in \mathbb{R}. \]

The Kolmogorov–Smirnov test for uniformity is constituted by the following: - The null hypothesis \(H_0\): The sample is from a given distribution \(F\) against the alternative hypothesis \(H_a\) that it is not. - The test statistic \[ D_n=\sup_{x\in\mathbb R}\vert F_n(x)-F(x)\vert, \] where \(n\) is the sample size. - The confidence level \(\alpha\) (to be chosen).

As \(n\rightarrow\infty\), \(\sqrt{n} D_n\) converges to \[ \sup_{t\in\mathbb R}\vert B(F(t))\vert, \] in distribution, where \(B\) is a Brownian bridge (i.e., the quantity \(\sup_{t\in\mathbb R}\vert B(F(t))\vert\) is a random variable).

For \(F(t)=t\), the uniform distribution, the critical values of the Kolmogorov statistics \(D_n\) are known. For large \(n\), the statistics converge in distribution to the so-called Kolmogorov distribution, which satisfies \[ K=\sup_{t\in [0,1]}\vert B(t)\vert. \] In fact, it can be shown that \[ \mathbb P[K\leq x]=1-2\sum_{k=1}^\infty (-1)^{k-1}e^{-2k^2x^2}, \quad x \in \mathbb{R}^+. \]

6.5 Stochastic Processes